Rhetorical Move Detection in English Abstracts: Multi-label Sentence Classifiers and their Annotated Corpora

Authors

  • Carmen Dayrell
  • Arnaldo Candido
  • Gabriel Lima
  • Danilo Machado
  • Ann A. Copestake
  • Valéria Delisandra Feltrim
  • Stella E. O. Tagnin
  • Sandra M. Aluísio
Abstract

The relevance of automatically identifying rhetorical moves in scientific texts has been widely acknowledged in the literature. This study focuses on abstracts of standard research papers written in English and aims to tackle a fundamental limitation of current machine-learning classifiers: they are mono-labeled, that is, a sentence can be assigned only a single label. However, such an approach does not adequately reflect actual language use, since a move can be realized by a clause, a sentence, or even several sentences. Here, we present MAZEA (Multi-label Argumentative Zoning for English Abstracts), a multi-label classifier that automatically identifies rhetorical moves in abstracts and allows a given sentence to be assigned as many labels as appropriate. We have drawn on various other NLP tools and used two large training corpora: (i) one consisting of 645 abstracts from physical sciences and engineering (PE) and (ii) another made up of 690 abstracts from life and health sciences (LH). This paper presents our preliminary results, discusses the various challenges involved in multi-label tagging, and works towards satisfactory solutions. We also make our two training corpora publicly available so that they may serve as a benchmark for this new task.
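The contrast between mono-label and multi-label tagging described in the abstract can be illustrated with a minimal sketch. It is not MAZEA's implementation: the move names in `MOVES`, the helper functions `to_indicator` and `to_mono_label`, and the example sentences are all assumptions introduced here for illustration, since the abstract does not list the actual label set or show corpus data.

```python
from typing import List, Set

# Hypothetical move inventory; the abstract does not list the actual label set.
MOVES: List[str] = ["background", "purpose", "method", "result", "conclusion"]


def to_indicator(labels: Set[str]) -> List[int]:
    """Encode a set of move labels as a 0/1 vector ordered like MOVES."""
    unknown = labels - set(MOVES)
    if unknown:
        raise ValueError(f"unknown move labels: {sorted(unknown)}")
    return [1 if move in labels else 0 for move in MOVES]


def to_mono_label(labels: Set[str]) -> str:
    """Collapse a label set to one label, as a mono-label scheme would have to.

    Picks the first matching move in MOVES order; an arbitrary illustrative rule.
    """
    for move in MOVES:
        if move in labels:
            return move
    raise ValueError("sentence has no label")


# Invented example sentences, not taken from the PE or LH corpora.
annotated = [
    ("We present a classifier and evaluate it on two corpora.", {"purpose", "method"}),
    ("Accuracy improves over the baseline.", {"result"}),
]

for sentence, labels in annotated:
    # The indicator vector keeps every move; the mono label discards all but one.
    print(to_indicator(labels), to_mono_label(labels), sentence)
```

In this sketch the first sentence keeps both of its moves in the indicator vector, whereas the mono-label view retains only one of them, which is the loss of information the abstract points to.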


Similar resources

Interfaces of Macro and Microstructure in Academic Writing: The Case of Research Article Abstracts

Although flourishing research has been devoted to research on article abstracts, more studies are needed to unpack the relationship between rhetorical moves and their associated linguistic and rhetorical features (e.g., metadiscourse). To underpin this relationship, the current study analyzed a total of 60 research article abstracts written in English by two cultural groups in three di...


A Rhetorical Move Analysis of TEFL Thesis Abstracts: The Case of Allameh Tabataba’i University

Abstract in every research paper has always been functioning as an attention-grabber which can encourage readers to keep reading the research or to dissuade it. Although abstracts are believed to play an important role in distributing the research findings, few studies have been done to evaluate the rhetorical organization of thesis abstracts, especially in the field of Teaching English as a Fo...


An Analysis of English and Persian Academic Written Discourses in Human Sciences: An Evolutionary Account

The present paper focused on the sociocultural explanations of rhetorical differences between English and Persian and was based on the contrastive genre analysis of Applied Linguistics research article abstracts in these two languages. The evolutionary nature of research article abstracts was also investigated from 1985 to 2005, in three stages, with a time interval of 10 years. Seventy eight r...


Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction

Since Swales’ (1981, 1990) CARS model work on the move structure of research articles, studies on genre analysis have been carried out amongst which works on different parts of research articles in various disciplines has gained a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...


Medical Research Article Introductions in Persian and English Contexts: Rhetorical and Metadiscoursal Differences

Medical discourse has recently attracted much scholarly attention. However, few studies have concentrated on both the overall rhetorical structure of the research article (RA) and the specific lexicogrammatical features of the texts, particularly English-Persian contrastive studies on medical RAs. Relying on Nwogu’s (1997) framework, the present study aimed at providing a macroanalysis of the I...




Publication date: 2012